Exploration of spatial and temporal patterns in abundance, and bodymass of fishes from the Northeast groundfish survey. Build code containing data wrangling and conversions can be accessed here.
# Do some formatting
weights_20 <- weights_20 %>%
mutate(
id = as.character(id),
season = ifelse(season %in% c("SPRING", "Spring"), "Spring", "Fall"),
season = factor(season, levels = c("Spring", "Fall"))
)
# Run Summary Functions
ann_means <- ss_annual_summary(weights_20)
seasonals <- ss_seasonal_summary(weights_20)
# bind them so you can facet
summs <- bind_rows(ann_means, seasonals) %>%
mutate(season = factor(season, levels = c("Spring", "Fall", "Spring + Fall")))
#drop haddock to see if that changes the bump
haddock_ann <- ss_annual_summary(filter(weights_20, comname != "haddock"))
haddock_seas <- ss_seasonal_summary(filter(weights_20, comname != "haddock"))
no_haddock <- bind_rows(haddock_ann, haddock_seas) %>%
mutate(season = factor(season, levels = c("Spring", "Fall", "Spring + Fall")))
Patterns in annual abundance or CPUE of High Profile Species at annual and/or seasonal levels.
Biomass of a given species from a station is a function of the adjusted number caught, their lengths, and the corresponding weights at-length using the species specific length-weight relationship coefficients. sex-specific coefficients and seasonal growth coefficients are used when available.
Total biomass in the following plots represent the total of all individual biomass estimates from all species with available growth coefficients, for each year and within each survey season.
bio_1 <- ann_means %>%
ggplot(aes(est_year, lw_biomass_kg, color = season, linetype = "All Species")) +
geom_line(group = 1) +
geom_line(data = haddock_ann,
aes(est_year, lw_biomass_kg, color = season, linetype = "Haddock Removed")) +
scale_y_continuous(labels = scales::comma_format()) +
scale_color_manual(values = "black") +
labs(x = "", y = "Total Biomass (kg)")
bio_2 <- seasonals %>%
ggplot(aes(est_year, lw_biomass_kg, color = season)) +
geom_line() +
scale_y_continuous(labels = scales::comma_format()) +
labs(x = "", y = "Total Biomass (kg)")
bio_1 / bio_2
ship_1 <- ann_means %>%
ggplot(aes(est_year, fscs_biomass_kg, color = season, linetype = "All Species")) +
geom_line(group = 1) +
geom_line(data = haddock_ann,
aes(est_year, fscs_biomass_kg, color = season, linetype = "Haddock Removed")) +
scale_color_manual(values = "black") +
scale_y_continuous(labels = scales::comma_format()) +
labs(x = "", y = "Total Biomass (kg)")
ship_2 <- seasonals %>%
ggplot(aes(est_year, fscs_biomass_kg, color = season)) +
geom_line() +
scale_y_continuous(labels = scales::comma_format()) +
labs(x = "", y = "Total Biomass (kg)")
ship_1 / ship_2
For area stratified weight in kg I currently have these formulas:
The stratum area ratio is the area of strata: \(1,2,3, ...s\) relative to the total area of the survey (only stratum with data we use is included.)
\[stratio_s = area(stratum_s)~ / total~area\]
Next, using the amount of tows in a given year for a specific strata, we get the amount of biomass (kg) across that group and divide it by the effort. This is then scaled by \(stratio\) to scale it by how much cpue for that stratum reflects the total area of the surey.
For Strata: \(1,2,3, ...s\)
For years: \(1,2,3, ...i\)
\[stratum~weighted~cpue = stratio_s * (~sum(biomass_{is})~/~n~stations_{is}~),\]
Then that weighted cpue is transformed back to kg. The stratum weighted catch per station is divided by .01 (nautical miles conversion), then multiplied by the total area of all strata. Finally that value is then then we take the log base ten of that to return the biomass?
\[stratum~weighted~biomass = log10(~~(stratum~weighted~cpue~/~.01)~ *~total~area~~)\]
lw_strat_1 <- ann_means %>%
ggplot(aes(est_year, lw_strat_biomass, color = season, linetype = "All Species")) +
geom_line(group = 1) +
scale_color_manual(values = "black") +
scale_y_continuous(labels = scales::comma_format()) +
labs(x = "", y = "Stratum Weighted Biomass \n (l-W Regressions)")
lw_strat_2 <- seasonals %>%
ggplot(aes(est_year, lw_strat_biomass, color = season)) +
geom_line() +
scale_y_continuous(labels = scales::comma_format()) +
labs(x = "",
y = "Stratum Weighted Biomass \n (L-W Regressions)")
lw_strat_1 / lw_strat_2
strat_1 <- ann_means %>%
ggplot(aes(est_year, fscs_strat_biomass, color = season, linetype = "All Species")) +
geom_line(group = 1) +
scale_color_manual(values = "black") +
scale_y_continuous(labels = scales::comma_format()) +
labs(x = "", y = "Stratum Weighted Biomass \n (Gross Shipboard Weight)")
strat_2 <- seasonals %>%
ggplot(aes(est_year, fscs_strat_biomass, color = season)) +
geom_line() +
scale_y_continuous(labels = scales::comma_format()) +
labs(x = "",
y = "Stratum Weighted Biomass \n (Gross Shipboard Weight)",
caption = "Stratum Weighted Biomass = Total Stratum Biomass (kg) * (stratum area / total stratum area)")
strat_1 / strat_2
Biomass per unit effort in the following figures represents the total biomass divided by the effort in number of stations. Adjustments for the 2008 vessel change are assumed to be corrected for in the data. There has been no correction for differences in stratum area in these figures, only station effort.
cpue_1 <- ann_means %>%
ggplot(aes(est_year, lw_biomass_per_station, color = season, linetype = "All Species")) +
geom_line(group = 1) +
geom_line(data = haddock_ann,
aes(est_year, lw_biomass_per_station, color = season, linetype = "Haddock Removed")) +
scale_color_manual(values = "black") +
labs(x = "", y = "Adjusted Biomass per Station (kg)")
cpue_2<- seasonals %>%
ggplot(aes(est_year, lw_biomass_per_station, color = season)) +
geom_line() +
labs(x = "", y = "Adjusted Biomass per Station (kg)")
cpue_1 / cpue_2
Relative contribution to overall biomass by species group based on management and/or life history: Demersal, Elasmobranch, Groundfish, Pelagic
# Species Class Bins Data Prep
class_totals <- weights_20 %>%
group_by(est_year, spec_class) %>%
summarise(`Total Class Biomass` = sum(sum_weight_kg)) %>%
ungroup() %>%
left_join(ann_means, by = "est_year") %>%
mutate(`Percent of Annual Biomass` = (`Total Class Biomass` / lw_biomass_kg) * 100 ,
`Species Category` = spec_class,
Year = as.character(est_year))
#2008
# Class Proportions
class_totals %>%
ggplot(aes(x = fct_rev(Year), y = `Percent of Annual Biomass`, fill = `Species Category`)) +
geom_col(color = "white", size = 0.2) +
geom_vline(xintercept = "2008", linetype = 2, color = "gray25") +
labs(x = "", y = "Proportion of Total Annual Biomass") +
coord_flip() +
scale_fill_gmri() +
scale_y_continuous(position = "right", ) +
guides("fill" = guide_legend(title.position = "top", title.hjust = 0.5)) +
theme(legend.position = "bottom")
guild_plots <- weights_20 %>%
split(.$spec_class) %>%
imap(function(guild, guild_name){
year_df <- guild %>%
split(.$est_year) %>%
imap_dfr(function(year_group, year){
#quantile breaks
probs <- c(0.5, 0.25, 0.5, 0.75, 0.95 )
# length quantiles
length_quantiles <- Hmisc::wtd.quantile(
x = year_group$length,
weights = year_group$numlen_adj,
probs = ) %>%
as.data.frame() %>%
rownames_to_column() %>%
setNames(c("probability", "length")) %>%
mutate(probability = factor(probs),
year = year)
# weight quantiles
weight_quantiles <- Hmisc::wtd.quantile(
x = year_group$ind_weight_kg,
weights = year_group$numlen_adj,
probs = probs) %>%
as.data.frame() %>%
rownames_to_column() %>%
setNames(c("probability", "weight (kg)")) %>%
mutate(probability = factor(probs),
year = year)
guild_data <- left_join(length_quantiles, weight_quantiles,
by = c("year", "probability")) %>%
mutate(year = fct_rev(year))
return(guild_data)
})
#return(year_df)
guild_key <- c("dem" = "Demersals",
"el" = "Elasmobranchs",
"gf" = "Groundfish",
"pel" = "Pelagics")
#make plot
p1 <- ggplot(year_df, aes(length, fct_rev(year))) +
geom_rect(xmin = -Inf,
xmax = Inf,
ymin = "2008",
ymax = "2019",
fill = "gray90") +
geom_line(aes(group = year), color = "gray50", alpha = 0.5) +
geom_point(aes(color = probability)) +
scale_color_gmri() +
labs(subtitle = guild_key[guild_name], x = "Individual Length (cm)", y = NULL) +
theme(legend.position = "none")
#make plot
p2 <- ggplot(year_df, aes(`weight (kg)`, fct_rev(year))) +
geom_rect(xmin = -Inf,
xmax = Inf,
ymin = "2008",
ymax = "2019",
fill = "gray90") +
geom_line(aes(group = year), color = "gray50", alpha = 0.5) +
geom_point(aes(color = probability)) +
scale_color_gmri() +
labs(x = "Individual Weight (kg)", y = NULL) +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank())
patch_plot <- p1 + p2
return(patch_plot)
})
guild_plots$pel
# Size Plots
guild_plots$dem
guild_plots$el
guild_plots$gf
Check these across years and also by season
summs %>%
ggplot(aes(est_year, mean_ind_bodymass_lw * 1000, color = season)) +
geom_line(show.legend = FALSE) +
scale_color_manual(values = c(
"Spring + Fall" = "black",
"Spring" = as.character(gmri_cols("gmri blue")),
"Fall" = as.character(gmri_cols("orange")))) +
#theme(legend.position = "bottom", legend.title = element_blank()) +
labs(x = "", y = "Mean Individual Biomass (g)") +
facet_wrap(~season, ncol = 1)
summs %>%
ggplot(aes(est_year, mean_ind_length, color = season)) +
geom_line(show.legend = FALSE) +
scale_color_manual(values = c(
"Spring + Fall" = "black",
"Spring" = as.character(gmri_cols("gmri blue")),
"Fall" = as.character(gmri_cols("orange")))) +
#theme(legend.position = "bottom", legend.title = element_blank()) +
labs(x = "", y = "Mean Individual Length (cm)") +
facet_wrap(~season, ncol = 1)
Since 2008 there has been a large increase in the amount of total biomass caught and the adjusted CPUE. As a way to look into the top species contributing to this I have pulled out the top 5 species for thbat year by total biomass caught.
# Pull top 5 species from each year, by biomass
top_5 <- weights_20 %>%
filter(est_year >= 2004) %>%
group_by(est_year, comname) %>%
summarise(n_fish = sum(numlen_adj),
lw_biomass = sum(sum_weight_kg)) %>%
ungroup() %>%
split(.$est_year) %>%
map_dfr(function(yr_group){
yr_group %>%
slice_max(order_by = lw_biomass, n = 5)
})
# Plot each year, re-ordered by biomass
t5_plots <- top_5 %>%
split(.$est_year) %>%
map(function(x){
x %>% mutate(comname = fct_drop(comname),
comname = fct_reorder(comname, .fun = max, lw_biomass)) %>%
ggplot(aes(x = lw_biomass, y = comname)) +
geom_point() +
geom_segment(aes(yend = comname, x = 0, xend = lw_biomass)) +
facet_wrap(~est_year) +
scale_x_continuous(labels = scales::comma_format()) +
labs(x = "Total Biomass (kg)", y = "")
})
t1 <- ((t5_plots$`2004` | t5_plots$`2005`) / (t5_plots$`2006` | t5_plots$`2007`))
t2 <- ((t5_plots$`2008` | t5_plots$`2009`) / (t5_plots$`2010` | t5_plots$`2011`))
t3 <- ((t5_plots$`2012` | t5_plots$`2013`) / (t5_plots$`2014` | t5_plots$`2015`))
t4 <- ((t5_plots$`2016` | t5_plots$`2017`) / (t5_plots$`2018` | t5_plots$`2019`))
t1
t2
t3
t4
Bodymass size cutoffs are used to plot the timelines of total fish biomass below the following thresholds for individual biomass: .0625, .125, .25, .5, 1kg. This is going to fail for the shipboard weights because that weight is divided equally across different measurements of fishes regardless of size.
max_weight_plot <- function(size_cutoff){
weights_20 %>%
filter(ind_weight_kg < size_cutoff) %>%
group_by(est_year) %>%
summarise(total_biomass_lw = sum(sum_weight_kg),
`Maximum Individual Biomass` = str_c("Individual Mass < ", size_cutoff, " kg")) %>%
ggplot(aes(est_year, total_biomass_lw, color = `Maximum Individual Biomass`)) +
geom_line() +
scale_color_gmri() +
scale_y_continuous(labels = scales::comma_format()) +
labs(x = NULL, y = "Total Annual (kg)") +
theme(legend.position = "bottom")
}
# .5 kg
max_weight_plot(.0625)
max_weight_plot(.125)
max_weight_plot(.25)
max_weight_plot(.5)
max_weight_plot(1)
Same idea for the larger fish, but the opposite. A minimum cutoff for the individual bodymass. Below are the upper quantiles for individual body mass weighted by number at length.
# # Weighted Quantiles for Individual weights
# Hmisc::wtd.quantile(x = weights_20$ind_weight_kg, weights = weights_20$numlen_adj, probs = c(.75, 0.8, 0.9, 0.95)) %>%
# as.data.frame() %>% round(digits = 2) %>%
# knitr::kable()
# plotting function
min_weight_plot <- function(size_cutoff){
weights_20 %>%
filter(ind_weight_kg > size_cutoff) %>%
group_by(est_year) %>%
summarise(total_biomass_lw = sum(sum_weight_kg),
`Minimum Individual Biomass` = str_c("Individual Mass > ", size_cutoff, " kg")) %>%
ggplot(aes(est_year, total_biomass_lw, color = `Minimum Individual Biomass`)) +
geom_line() +
scale_color_gmri() +
scale_y_continuous(labels = scales::comma_format()) +
labs(x = NULL, y = "Total Annual (kg)") +
theme(legend.position = "bottom")
}
min_weight_plot(.25)
min_weight_plot(.5)
min_weight_plot(1)
min_weight_plot(2)
min_weight_plot(5)
min_weight_plot(10)
For large regions like Georges Bank and the Gulf of Maine, what kind of patterns are we seeing.
# Load the strata
survey_strata <- read_sf(str_c(res_path, "Shapefiles/BottomTrawlStrata/BTS_Strata.shp")) %>%
clean_names() %>%
filter(strata >= 01010 ,
strata <= 01760,
strata != 1310,
strata != 1320,
strata != 1330,
strata != 1350,
strata != 1410,
strata != 1420,
strata != 1490)
# Key to which strata = which regions
strata_key <- list(
"Georges Bank" = as.character(13:23),
"Gulf of Maine" = as.character(24:40),
"Southern New England" = str_pad(as.character(1:12), width = 2, pad = "0", side = "left"),
"Mid-Atlantic Bight" = as.character(61:76))
# Assign Areas
survey_strata <- survey_strata %>%
mutate(
strata = str_pad(strata, width = 5, pad = "0", side = "left"),
strata_num = str_sub(strata, 3, 4),
area = case_when(
strata_num %in% strata_key$`Georges Bank` ~ "Georges Bank",
strata_num %in% strata_key$`Gulf of Maine` ~ "Gulf of Maine",
strata_num %in% strata_key$`Southern New England` ~ "Southern New England",
strata_num %in% strata_key$`Mid-Atlantic Bight` ~ "Mid-Atlantic Bight",
TRUE ~ "Outside Major Study Areas"
)) %>%
select(finstr_id, strata, strata_num, area, a2, str2, set, stratuma, str3, geometry)
# Load new england
new_england <- ne_states("united states of america") %>% st_as_sf(crs = 4326)
canada <- ne_states("canada") %>% st_as_sf(crs = 4326)
# Make trawl data an sf dataset
trawl_sf <- weights_20 %>% st_as_sf(coords = c("decdeg_beglon", "decdeg_beglat"), crs = 4326)
# Plot to check
ggplot() +
geom_sf(data = new_england) +
geom_sf(data = canada) +
geom_sf(data = survey_strata, aes(fill = area)) +
coord_sf(xlim = c(-77, -65.5), ylim = c(34, 45.75), expand = FALSE) +
guides(fill = guide_legend(nrow = 2)) +
theme_bw() +
theme(legend.position = "bottom", legend.title = element_blank())
# Just Area, all seasona
area_summs <- weights_20 %>%
group_by(survey_area) %>%
summarise(
season = "Spring + Fall",
lw_biomass_kg = sum(sum_weight_kg, na.rm = T),
n_stations = n_distinct(id),
lw_biomass_per_station = lw_biomass_kg / n_stations,
mean_ind_bodymass = weighted.mean(ind_weight_kg, weights = numlen_adj),
mean_ind_length = weighted.mean(length, weights = numlen_adj)
)
# Area x Season
seas_area <- weights_20 %>%
group_by(survey_area, season) %>%
summarise(
lw_biomass_kg = sum(sum_weight_kg, na.rm = T),
n_stations = n_distinct(id),
lw_biomass_per_station = lw_biomass_kg / n_stations,
mean_ind_bodymass = weighted.mean(ind_weight_kg, weights = numlen_adj),
mean_ind_length = weighted.mean(length, weights = numlen_adj)
)
# Combine those two
summs_combined <- bind_rows(area_summs, seas_area) %>%
mutate(season = factor(season, levels = c("Spring", "Fall", "Spring + Fall")))
summs_combined %>%
mutate_if(is.numeric,round, 2) %>%
arrange(survey_area,season) %>%
knitr::kable()
| survey_area | season | lw_biomass_kg | n_stations | lw_biomass_per_station | mean_ind_bodymass | mean_ind_length |
|---|---|---|---|---|---|---|
| GB | Spring | 4183904 | 2607 | 1604.87 | 0.86 | 39.56 |
| GB | Fall | 9044802 | 2343 | 3860.35 | 0.70 | 37.14 |
| GB | Spring + Fall | 13228706 | 4950 | 2672.47 | 0.78 | 38.36 |
| GoM | Spring | 2779467 | 3005 | 924.95 | 0.63 | 34.55 |
| GoM | Fall | 7022811 | 2952 | 2379.00 | 0.68 | 35.81 |
| GoM | Spring + Fall | 9802278 | 5957 | 1645.51 | 0.65 | 35.20 |
| MAB | Spring | 5703280 | 1995 | 2858.79 | 0.84 | 44.38 |
| MAB | Fall | 1176131 | 1925 | 610.98 | 0.65 | 25.67 |
| MAB | Spring + Fall | 6879411 | 3920 | 1754.95 | 0.78 | 38.44 |
| SNE | Spring | 3256620 | 2237 | 1455.80 | 0.56 | 36.25 |
| SNE | Fall | 1736036 | 2093 | 829.45 | 0.49 | 33.04 |
| SNE | Spring + Fall | 4992656 | 4330 | 1153.04 | 0.53 | 34.99 |
# Year x Area
area_summs_y <- weights_20 %>%
group_by(est_year, survey_area) %>%
summarise(
season = "Spring + Fall",
lw_biomass_kg = sum(sum_weight_kg, na.rm = T),
n_stations = n_distinct(id),
lw_biomass_per_station = lw_biomass_kg / n_stations,
mean_ind_bodymass = weighted.mean(ind_weight_kg, weights = numlen_adj),
mean_ind_length = weighted.mean(length, weights = numlen_adj)
)
area_summs_y %>%
ggplot(aes(est_year, lw_biomass_kg)) +
geom_line() +
facet_wrap(~survey_area, ncol = 2) +
scale_y_continuous(labels = scales::comma_format()) +
labs(x = "", y = "Total Biomass (kg)")
area_summs_y %>%
ggplot(aes(est_year, lw_biomass_per_station)) +
geom_line() +
facet_wrap(~survey_area, ncol = 2) +
labs(x = "", y = "Adjusted Biomass per Station (kg)")
Concerns have been raised that the datasets obtained through the NEFSC are inconsistent in some areas over time. The following plots seek to identify differences in a dataset obtained in 2016 from what we currently are exploring with the 2020 dataset.
Each file was processed for size-spectra analysis using the same processing steps. This includes the same species codes, the same abundance and stratification adjustments, and the same L-W derived biomasses.
weights_16 <- read_csv(here::here("data/ss_prepped_data/survdat_2016_ss.csv"),
col_types = cols(),
guess_max = 1e5)
weights_19 <- read_csv(here::here("data/ss_prepped_data/survdat_2019_ss.csv"),
col_types = cols(),
guess_max = 1e5)
weights_20 <- read_csv(here::here("data/ss_prepped_data/survdat_2020_ss.csv"),
col_types = cols(),
guess_max = 1e5)
# run summaries
summ_16 <- ss_regional_differences(weights_16) %>% mutate(source = "2016")
summ_19 <- ss_regional_differences(weights_19) %>% mutate(source = "2019")
summ_20 <- ss_regional_differences(weights_20) %>% mutate(source = "2020")
reg_summs <- bind_rows(list(summ_16, summ_19, summ_20))
# Total Biomass
p1 <- reg_summs %>%
ggplot(aes(est_year, lw_biomass_kg, color = source)) +
geom_line(show.legend = F) +
scale_y_continuous(labels = scales::comma_format()) +
facet_wrap(~survey_area, ncol = 1, scales = "free") +
labs(x = "", y = "Total Biomass \n (L-W Regressions)")
# Total Biomass - FSCS
p2 <- reg_summs %>%
ggplot(aes(est_year, fscs_biomass_kg, color = source)) +
geom_line() +
scale_y_continuous(labels = scales::comma_format()) +
facet_wrap(~survey_area, ncol = 1) +
labs(x = "", y = "Total Biomass \n (FSCS Haul Weights)")
# effort
p3 <- reg_summs %>%
ggplot(aes(est_year, n_stations, color = source)) +
geom_line(show.legend = F) +
scale_y_continuous(labels = scales::comma_format()) +
facet_wrap(~survey_area, ncol = 1) +
labs(x = "", y = "Effort (haul count)")
# Species
p4 <- reg_summs %>%
ggplot(aes(est_year, n_species, color = source)) +
geom_line(show.legend = F) +
scale_y_continuous(labels = scales::comma_format()) +
facet_wrap(~survey_area, ncol = 1) +
labs(x = "", y = "Distinct Species")
p1 + p2 + p3 + p4
A work by Adam A. Kemberling
Akemberling@gmri.org